Vector Search at Scale: Why One Size Doesn't Fit All | S2 E13

Update: 2024-11-07

Description

Ever wondered why your vector search becomes painfully slow after scaling past a million vectors? You're not alone - even tech giants struggle with this.

Charles Xie, founder of Zilliz (the company behind Milvus), shares how they solved vector database scaling challenges at a scale of 100B+ vectors:

Key Insights:

Multi-tier storage strategy:
- GPU memory (1% of data, fastest)
- RAM (10% of data)
- Local SSD
- Object storage (slowest but cheapest)
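
To make the tiering idea concrete, here is a minimal sketch of placing data segments into tiers by access frequency. It is an illustration, not how Milvus implements it: the `Tier` class, the `assign_tiers` function, and the segment names are hypothetical, and the 30% local SSD share is an assumption (the notes only give shares for GPU memory and RAM).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    share: float  # fraction of all segments this tier may hold


# GPU and RAM shares follow the episode notes; the SSD share is an assumption.
TIERS = [
    Tier("gpu_memory", 0.01),
    Tier("ram", 0.10),
    Tier("local_ssd", 0.30),       # assumed value, not from the episode
    Tier("object_storage", 1.00),  # catch-all for everything that is left
]


def assign_tiers(access_counts: dict[str, int]) -> dict[str, str]:
    """Place the most frequently accessed segments in the fastest tiers."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    total = len(ranked)
    placement: dict[str, str] = {}
    start = 0
    for tier in TIERS:
        if tier is TIERS[-1]:
            capacity = total - start  # slowest tier takes the remainder
        else:
            capacity = int(total * tier.share)
        for segment in ranked[start:start + capacity]:
            placement[segment] = tier.name
        start += capacity
    return placement


# Hypothetical workload: segment i has been read i times.
counts = {f"seg_{i}": i for i in range(100)}
placement = assign_tiers(counts)
```

With 100 segments, the single hottest segment lands in GPU memory, the next ten in RAM, and the long tail in object storage, which is where the cost savings come from.
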
Real-time search solution:
- New data goes to buffer (searchable immediately)
- Index builds in background when buffer fills
- Combines buffer & main index results
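
The buffer-plus-index pattern above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `RealTimeIndex` and its methods are hypothetical names, the "main index" is a plain list searched by brute force rather than a real ANN index, and the index build runs synchronously where a production system would run it in the background.

```python
import heapq
import math


class RealTimeIndex:
    """Toy model: fresh vectors sit in a buffer, older ones in a main index."""

    def __init__(self, buffer_limit: int = 4):
        self.buffer_limit = buffer_limit
        self.buffer = []   # (id, vector) pairs, searchable immediately
        self.indexed = []  # stand-in for a built ANN index

    def insert(self, vec_id, vector):
        self.buffer.append((vec_id, vector))
        if len(self.buffer) >= self.buffer_limit:
            self._build_index()

    def _build_index(self):
        # A real system would build the index in the background;
        # here the buffer is simply folded into the main structure.
        self.indexed.extend(self.buffer)
        self.buffer = []

    @staticmethod
    def _scan(items, query, k):
        # Brute-force top-k by Euclidean distance.
        return heapq.nsmallest(k, ((math.dist(query, v), i) for i, v in items))

    def search(self, query, k):
        # Query both parts, then merge the two candidate lists.
        candidates = self._scan(self.buffer, query, k)
        candidates += self._scan(self.indexed, query, k)
        return heapq.nsmallest(k, candidates)


index = RealTimeIndex(buffer_limit=3)
index.insert("a", [0.0, 0.0])
index.insert("b", [1.0, 0.0])
fresh_hit = index.search([0.0, 0.0], k=1)  # served from the buffer alone
index.insert("c", [5.0, 5.0])              # buffer is full, index is built
index.insert("d", [0.1, 0.0])              # new data lands in the buffer again
merged = index.search([0.0, 0.0], k=2)     # combines buffer and index results
```

The key design point is that a query never waits for an index build: the buffer stays small enough to scan directly, and its results are merged with those from the main index.
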
Performance optimization:
- GPU acceleration for 10k-50k queries/second
- Customizable trade-offs between:
  - Cost
  - Latency
  - Search relevance
Future developments:
- Self-learning indices
- Hybrid search methods (dense + sparse)
- Graph embedding support
- ColBERT integration
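
For the hybrid search item, one common way to combine a dense (embedding) ranking with a sparse (keyword) ranking is reciprocal rank fusion (RRF). The notes do not say which fusion method is used in Milvus, so treat this as a generic sketch: the function name, the document IDs, and the constant `k=60` (a value often used for RRF) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked highly in multiple lists rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense and a sparse retriever.
dense_hits = ["d1", "d2", "d3"]
sparse_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF only needs ranks, not raw scores, so it avoids having to normalize dense similarity scores against sparse keyword scores.
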

Perfect for teams hitting scaling walls with their current vector search implementation or planning for future growth.

Worth watching if you're building production search systems or need to optimize costs vs performance.

Charles Xie:

Nicolay Gerold:

- 00:00 Introduction to Search System Challenges
- 00:26 Introducing Milvus: The Open Source Vector Database
- 00:58 Interview with Charles: Founder of Zilliz
- 02:20 Scalability and Performance in Vector Databases
- 03:35 Challenges in Distributed Systems
- 05:46 Data Consistency and Real-Time Search
- 12:12 Hierarchical Storage and GPU Acceleration
- 18:34 Emerging Technologies in Vector Search
- 23:21 Self-Learning Indexes and Future Innovations
- 28:44 Key Takeaways and Conclusion
